Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings
نویسنده
چکیده
Everyday the newswire introduce events from all over the world, highlighting new names of persons, locations and organizations with different origins. These names appear as Out of Vocabulary (OOV) words for Machine translation, cross lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate the unknown words, that is, to render them in the orthography of the second language. We introduce a statistical approach for transliteration only using the bilingual resources released in the shared task and without any previous knowledge of the target languages. Mapping the Transliteration problem to the Machine Translation problem, we make use of the phrase based SMT approach and apply it on substrings of names. In the English to Russian task, we report ACC (Accuracy in top-1) of 0.545, Mean F-score of 0.917, and MRR (Mean Reciprocal Rank) of 0.596. Due to time constraints, we made a single experiment in the English to Chinese task, reporting ACC, Mean F-score, and MRR of 0.411, 0.737, and 0.464 respectively. Finally, it is worth mentioning that the system is language independent since the author is not aware of either languages used in the experiments.
منابع مشابه
Transliteration using phrase based SMT approach on substrings
Translation of named entities (NEs), such as person names, organization names and location names is crucial for cross lingual information retrieval, machine translation, and many other natural language processing applications. Newly named entities are introduced on daily basis in newswire and this greatly complicates the translation task. Named Entities translation between languages having diff...
متن کاملIntegrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System
The system presented in this paper is based upon a phrase-based statistical machine transliteration (SMT) framework. The SMT system’s log-linear model is augmented with a set of features specifically suited to the task of transliteration. In particular our model utilizes a feature based on a joint source-channel model, and a feature based on a maximum entropy model that predicts target grapheme...
متن کاملEnglish-Korean Named Entity Transliteration Using Statistical Substring-based and Rule-based Approaches
This paper describes our approach to English-Korean transliteration in NEWS 2011 Shared Task on Machine Transliteration. We adopt the substring-based transliteration approach which group the characters of named entity in both source and target languages into substrings and then formulate the transliteration as a sequential tagging problem to tag the substrings in the source language with the su...
متن کاملEnglish-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009
This paper presents English—Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...
متن کاملHindi Transliteration Using Context - Informed PB - SMT : the DCU System for NEWS 2009
This paper presents English—Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...
متن کامل